Part of Speech Tagging
General Requirements
| Criteria | Meet Specification |
|---|---|
| Submission includes all files required for grading | |
| Submitted files are complete and do not include any disallowed changes | Submitted notebook has made no changes to test case assertions |
Baseline Tagger Implementation
| Criteria | Meet Specification |
|---|---|
| Student correctly implements the | All emission count test case assertions pass. |
| Correct baseline most frequent class (MFC) tagger implementation | Baseline MFC tagger passes all test case assertions and produces the expected accuracy using the universal tagset. |
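The baseline tagger combines the two criteria above: count how often each word appears with each tag, then tag every word with the tag it was seen with most often. The following is a minimal Python sketch, not the project's reference solution. The function names `pair_counts`, `train_mfc`, and `mfc_tag`, the list-of-`(word, tag)` sentence layout, and the `NOUN` fallback for unseen words are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def pair_counts(tags, words):
    """Emission counts: counts[tag][word] = times `word` was observed with `tag`."""
    counts = defaultdict(Counter)
    for tag, word in zip(tags, words):
        counts[tag][word] += 1
    return counts


def train_mfc(tagged_sentences):
    """Map each word to its most frequent tag (hypothetical helper)."""
    pairs = [(tag, word) for sent in tagged_sentences for word, tag in sent]
    tags, words = zip(*pairs)
    emission_counts = pair_counts(tags, words)
    # Re-index as word -> Counter(tag) so each word's most common tag is easy to read off
    word_tags = defaultdict(Counter)
    for tag, word_counter in emission_counts.items():
        for word, n in word_counter.items():
            word_tags[word][tag] += n
    # most_common(1) breaks ties in first-seen order
    return {word: c.most_common(1)[0][0] for word, c in word_tags.items()}


def mfc_tag(table, sentence, unknown="NOUN"):
    """Tag a list of words; unseen words get the (assumed) fallback tag."""
    return [table.get(word, unknown) for word in sentence]
```

Because the baseline ignores context entirely, "runs" receives the same tag in every sentence, which is the limitation the HMM tagger addresses.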
Calculating Tag Counts
| Criteria | Meet Specification |
|---|---|
| Correct | All unigram test case assertions pass |
| Correct | All bigram test case assertions pass |
| Correct | All start and end count test case assertions pass |
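The four count tables map directly onto `collections.Counter`. A minimal sketch, assuming each training sentence has already been reduced to a non-empty sequence of tags; the function names are hypothetical and may not match the signatures the notebook's tests expect.

```python
from collections import Counter


def unigram_counts(tag_sequences):
    """C(t): how many times each tag occurs."""
    return Counter(tag for seq in tag_sequences for tag in seq)


def bigram_counts(tag_sequences):
    """C(t1, t2): how many times tag t2 immediately follows tag t1."""
    return Counter(pair for seq in tag_sequences for pair in zip(seq, seq[1:]))


def starting_counts(tag_sequences):
    """How many sentences begin with each tag."""
    return Counter(seq[0] for seq in tag_sequences if seq)


def ending_counts(tag_sequences):
    """How many sentences end with each tag."""
    return Counter(seq[-1] for seq in tag_sequences if seq)
```

Dividing these counts gives the HMM's maximum-likelihood parameters, for example P(t2 | t1) = C(t1, t2) / C(t1).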
Basic HMM Tagger Implementation
| Criteria | Meet Specification |
|---|---|
| Correct HMM network construction | All model topology test case assertions pass |
| Correct basic HMM tagger implementation | Basic HMM tagger passes all test case assertions and produces the expected accuracy using the universal tagset. |
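An HMM tagger of this kind combines start, transition, end, and emission probabilities and is typically decoded with the Viterbi algorithm. The sketch below builds the same kind of model without Pomegranate, using maximum-likelihood estimates from counts and a log-space Viterbi decoder. It is a dependency-free stand-in for illustration, not Pomegranate's API, and `train_hmm` and `viterbi` are hypothetical names.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """MLE start, transition, end, and emission probabilities.

    tagged_sentences: list of non-empty sentences, each a list of (word, tag) pairs.
    """
    tag_counts, start, end, bigrams = Counter(), Counter(), Counter(), Counter()
    emissions = defaultdict(Counter)
    for sent in tagged_sentences:
        tags = [tag for _, tag in sent]
        for word, tag in sent:
            tag_counts[tag] += 1
            emissions[tag][word] += 1
        bigrams.update(zip(tags, tags[1:]))
        start[tags[0]] += 1
        end[tags[-1]] += 1
    p_start = {t: c / len(tagged_sentences) for t, c in start.items()}
    # P(b | a) = C(a, b) / C(a); P(end | t) = C(t ends a sentence) / C(t)
    p_trans = {(a, b): c / tag_counts[a] for (a, b), c in bigrams.items()}
    p_end = {t: c / tag_counts[t] for t, c in end.items()}
    p_emit = {t: {w: c / tag_counts[t] for w, c in ws.items()} for t, ws in emissions.items()}
    return list(tag_counts), p_start, p_trans, p_end, p_emit


def viterbi(sentence, tags, p_start, p_trans, p_end, p_emit):
    """Return (log-probability, best tag path) for a list of words."""
    def log(p):
        return math.log(p) if p > 0 else -math.inf

    # best[t] = (log-prob of the best path ending in tag t, that path)
    best = {
        t: (log(p_start.get(t, 0)) + log(p_emit[t].get(sentence[0], 0)), [t])
        for t in tags
    }
    for word in sentence[1:]:
        best = {
            t: max(
                (score + log(p_trans.get((prev, t), 0)) + log(p_emit[t].get(word, 0)), path + [t])
                for prev, (score, path) in best.items()
            )
            for t in tags
        }
    # Include the transition into the end state before choosing the winner
    return max((score + log(p_end.get(t, 0)), path) for t, (score, path) in best.items())
```

With unsmoothed estimates, any word never seen in training has zero emission probability under every tag, so the whole sentence scores negative infinity and the returned path is meaningless.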
Tips to make your project stand out:
Students may run their taggers on more complex datasets (for example, the `nltk.corpus.brown` or `nltk.corpus.treebank` datasets).
Students may also try more advanced HMMs:
- Using pseudocounts or interpolated smoothing to handle missing data
- Retraining the hidden Markov model using Baum-Welch re-estimation (available via the `.fit()` method in Pomegranate)
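The smoothing suggestion can be illustrated with two common techniques: add-k pseudocounts for emission probabilities, and linear interpolation of bigram and unigram estimates for transitions. This is a minimal sketch; the values of `k` and `lam`, and reserving one vocabulary slot for unseen words, are assumptions to tune rather than prescribed settings.

```python
from collections import Counter


def add_k_emission(word_counts, vocab_size, k=1.0):
    """P(word | tag) with add-k pseudocounts (Laplace smoothing when k = 1).

    word_counts: Counter of words observed with a single tag.
    vocab_size: distinct word types, including one extra slot for unseen words (assumption).
    """
    denom = sum(word_counts.values()) + k * vocab_size
    return lambda word: (word_counts.get(word, 0) + k) / denom


def interpolated_transition(bigrams, unigrams, lam=0.9):
    """P(tag | prev) = lam * bigram MLE + (1 - lam) * unigram MLE."""
    total = sum(unigrams.values())

    def prob(prev, tag):
        bigram_p = bigrams.get((prev, tag), 0) / unigrams[prev] if unigrams.get(prev) else 0.0
        return lam * bigram_p + (1 - lam) * unigrams.get(tag, 0) / total

    return prob


emit_verb = add_k_emission(Counter({"runs": 2, "ended": 1}), vocab_size=5)
print(emit_verb("runs"), emit_verb("cat"))  # 0.375 0.125
```

Unseen words such as "cat" now receive a small nonzero probability, so a single unknown word no longer drives every Viterbi path to negative infinity.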